---
id: policies
sidebar_label: Policies
title: Policies
abstract: Your assistant uses policies to decide which action to take at each step in a conversation. There are machine-learning and rule-based policies that your assistant can use in tandem.
---
You can customize the policies your assistant uses by specifying the `policies`
key in your project's `config.yml`.
There are different policies to choose from, and you can include
multiple policies in a single configuration. Here's an example of
what a list of policies might look like:

```yaml-rasa title="config.yml"
language:  # your language
pipeline:
  # - <pipeline components>

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 200
  - name: RulePolicy
```
:::tip Starting from scratch?

If you don't know which policies to choose, leave out the `policies` key from your `config.yml` completely.
If you do, the [Suggested Config](./model-configuration.mdx#suggested-config)
feature will provide default policies for you.

:::

## Action Selection

At every turn, each policy defined in your configuration predicts
the next action with a certain confidence level. For more information
about how each policy makes its decision, read that policy's description below.
The policy that predicts with the highest confidence decides the assistant's next action.

:::note Maximum number of predictions
By default, your assistant can predict a maximum of 10 next actions
after each user message. To update this value,
you can set the environment variable `MAX_NUMBER_OF_PREDICTIONS`
to the desired number of maximum predictions.

:::
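
As a rough illustration of how such a limit can be read, the sketch below looks up `MAX_NUMBER_OF_PREDICTIONS` and falls back to the documented default of 10. The helper name `max_number_of_predictions` is hypothetical and is not part of the Rasa API.

```python
import os


def max_number_of_predictions(environ=os.environ):
    """Return the configured prediction limit (hypothetical helper).

    Falls back to 10, the default stated above, when the
    environment variable is not set.
    """
    return int(environ.get("MAX_NUMBER_OF_PREDICTIONS", "10"))


# With the variable unset, the default applies.
print(max_number_of_predictions({}))
# With the variable set, its value is used instead.
print(max_number_of_predictions({"MAX_NUMBER_OF_PREDICTIONS": "20"}))
```

The variable has to be set in the environment of the process that runs your assistant for the new limit to take effect.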

### Policy Priority

In the case that two policies predict with equal confidence (for example, the Memoization
and Rule Policies might both predict with confidence 1), the priority of the
policies is considered. Rasa Open Source policies have default priorities that are set to ensure the
expected outcome in the case of a tie. They look like this, where higher numbers have higher priority:

<!-- We want to have high priority policies first; it's not possible to use a Markdown ordered list for that. -->

* 6 - `RulePolicy`

* 3 - `MemoizationPolicy` or `AugmentedMemoizationPolicy`

* 2 - `UnexpecTEDIntentPolicy`

* 1 - `TEDPolicy`


In general, it is not recommended to have more
than one policy per priority level in your configuration. If you have two policies with the same priority and they predict
with the same confidence, the resulting action will be chosen randomly.
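
The selection rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Rasa's actual implementation: the `Prediction` class and the `select_action` function are hypothetical names introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Prediction:
    """One policy's vote for the next action (hypothetical structure)."""
    policy: str
    action: str
    confidence: float
    priority: int


def select_action(predictions, rng=random):
    """Pick the winning prediction.

    1. Highest confidence wins.
    2. Ties in confidence are broken by the higher priority.
    3. Ties in both are resolved randomly.
    """
    best_confidence = max(p.confidence for p in predictions)
    candidates = [p for p in predictions if p.confidence == best_confidence]
    best_priority = max(p.priority for p in candidates)
    candidates = [p for p in candidates if p.priority == best_priority]
    return rng.choice(candidates)


predictions = [
    Prediction("TEDPolicy", "utter_greet", 0.7, priority=1),
    Prediction("MemoizationPolicy", "utter_help", 1.0, priority=3),
    Prediction("RulePolicy", "utter_faq", 1.0, priority=6),
]
# Both MemoizationPolicy and RulePolicy predict with confidence 1.0,
# so the higher priority (6) decides.
print(select_action(predictions).policy)  # RulePolicy
```

The random choice in the last step is the reason to avoid configuring two policies with the same priority.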

If you create your own policy, use these priorities as a guide for figuring out the priority of your policy.
If your policy is a machine learning policy, it should most likely have priority 1, the same as the `TEDPolicy`.

:::warning Overriding policy priorities
All policy priorities are configurable via the `priority` parameter in the policy's configuration,
but we **do not recommend** changing them outside of specific cases such as custom policies.
Doing so can lead to unexpected and undesired bot behavior.

:::


## Machine Learning Policies

### TED Policy

The Transformer Embedding Dialogue (TED) Policy is
a multi-task architecture for next action prediction and entity
recognition. The architecture consists of several transformer encoders which are shared for both tasks.
A sequence of entity labels is predicted through a Conditional Random Field (CRF) tagging layer on top of the
user sequence transformer encoder output corresponding to the input sequence of tokens.
For the next action prediction, the dialogue transformer encoder output and the system action labels are embedded into a
single semantic vector space. We use the dot-product loss to maximize the similarity with the target label and
minimize similarities with negative samples.

If you want to learn more about the model, check out
[our paper](https://arxiv.org/abs/1910.00486) and our
[YouTube channel](https://www.youtube.com/watch?v=j90NvurJI4I&list=PL75e0qA87dlG-za8eLI6t0_Pbxafk-cxb&index=14&ab_channel=Rasa),
where we explain the model architecture in detail.

The TED Policy architecture comprises the following steps:

1. Concatenate features for
   - user input (user intent and entities) or user text processed through a user sequence transformer encoder,
   - previous system actions or bot utterances processed through a bot sequence transformer encoder,
   - slots and active forms

   for each time step into an input vector to the embedding layer that precedes the 
   dialogue transformer.

2. Feed the embedding of the input vector into the dialogue transformer encoder.

3. Apply a dense layer to the output of the dialogue transformer to get embeddings of the dialogue for each time step.

4. Apply a dense layer to create embeddings for system actions for each time step.

5. Calculate the similarity between the dialogue embedding and embedded system actions.
   This step is based on the [StarSpace](https://arxiv.org/abs/1709.03856) idea.

6. Concatenate the token-level output of the user sequence transformer encoder
   with the output of the dialogue transformer encoder for each time step.

7. Apply a CRF algorithm to predict contextual entities for each user text input.
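
Step 5 above can be illustrated with a toy example. The sketch below assumes that the dialogue and the candidate actions have already been embedded into the same vector space. It scores each action by its dot product with the dialogue embedding and turns the scores into confidences with a softmax. The function names and the three-dimensional vectors are made up for illustration; the real model uses trained embeddings.

```python
import math


def dot(u, v):
    """Dot-product similarity between two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def action_confidences(dialogue_embedding, action_embeddings):
    """Map each action name to a softmax confidence (hypothetical helper)."""
    similarities = {
        name: dot(dialogue_embedding, embedding)
        for name, embedding in action_embeddings.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    max_similarity = max(similarities.values())
    exps = {name: math.exp(s - max_similarity) for name, s in similarities.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Toy embeddings, not values produced by a trained model.
dialogue = [1.0, 0.0, 0.5]
actions = {
    "utter_greet": [0.9, 0.1, 0.4],
    "action_listen": [0.0, 1.0, 0.0],
    "utter_goodbye": [-0.5, 0.2, 0.1],
}
confidences = action_confidences(dialogue, actions)
# The action whose embedding is most similar to the dialogue embedding wins.
print(max(confidences, key=confidences.get))  # utter_greet
```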

**Configuration:**

You can pass configuration parameters to the `TEDPolicy` using the `config.yml` file.
If you want to fine-tune your model, start by modifying the following parameters:

* `epochs`:
  This parameter sets the number of times the algorithm will see the training data (default: `1`).
  One `epoch` is equal to one forward pass and one backward pass of all the training examples.
  Sometimes the model needs more epochs to learn properly,
  and sometimes more epochs do not improve performance.
  The lower the number of epochs, the faster the model is trained.
  Here is what the config would look like:

  ```yaml-rasa title="config.yml"
  policies:
  - name: TEDPolicy
    epochs: 200
  ```

* `max_history`:
  This parameter controls how much dialogue history the model looks at to decide which
  action to take next. Default `max_history` for this policy is `None`,
  which means that the complete dialogue history since session restart is taken into
  account. If you want to limit the model to only see a certain number of previous
  dialogue turns, you can set `max_history` to a finite value.
  Please note that you should pick `max_history` carefully, so that the model has enough
  previous dialogue turns to make a correct prediction.
  See [Featurizers](#featurizers) for more details.
  Here is what the config would look like:

  ```yaml-rasa title="config.yml"
  policies:
  - name: TEDPolicy
    max_history: 8
  ```

* `number_of_transformer_layers`:
  This parameter sets the number of transformer encoder layers to use for the
  sequence transformer encoders for user, action, and action label texts and for the
  dialogue transformer encoder
  (defaults: `text: 1, action_text: 1, label_action_text: 1, dialogue: 1`).
  The number of layers corresponds
  to the number of transformer blocks used in the model.

* `transformer_size`:
  This parameter sets the number of units in the transformer encoder layers of the
  sequence transformer encoders for user, action, and action label texts and of the
  dialogue transformer encoder
  (defaults: `text: 128, action_text: 128, label_action_text: 128, dialogue: 128`).
  The vectors coming out of the transformer encoders will have the given `transformer_size`.

* `weight_sparsity`:
  This parameter defines the fraction of kernel weights that are set to 0 for all feed forward layers
  in the model (default: `0.8`). The value should be a number between 0 and 1. If you set `weight_sparsity`
  to 0, no kernel weights will be set to 0 and the layer acts as a standard feed forward layer. You should not
  set `weight_sparsity` to 1, as this would result in all kernel weights being 0, i.e. the model would not be able
  to learn.

* `split_entities_by_comma`:
  This parameter defines whether adjacent entities separated by a comma should be treated as one, or split. For example,
  entities with the type `ingredients`, like "apple, banana" can be split into "apple" and "banana". An entity with type
  `address`, like "Schönhauser Allee 175, 10119 Berlin" should be treated as one.

  This parameter can be set to `True` or `False` globally:
  ```yaml-rasa title="config.yml"
  policies:
    - name: TEDPolicy
      split_entities_by_comma: True
  ```
  or set per entity type, such as:
  ```yaml-rasa title="config.yml"
  policies:
    - name: TEDPolicy
      split_entities_by_comma:
        address: False
        ingredients: True
  ```

* `constrain_similarities`:
  When set to `True`, this parameter applies a sigmoid cross entropy loss over all similarity terms.
  This helps keep the similarities between input and negative labels small,
  which should help the model generalize better to real-world test sets.
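
To make the `split_entities_by_comma` options described above more concrete, here is a simplified sketch that works on plain strings. It is not how `TEDPolicy` processes entities internally. The function name `split_entities` is hypothetical, and treating entity types missing from the per-type mapping as `True` is an assumption based on the parameter's default.

```python
def split_entities(entities, split_entities_by_comma=True):
    """Split comma-separated entity values according to the setting.

    entities: list of (entity_type, value) tuples.
    split_entities_by_comma: a bool applied to all entity types,
        or a dict mapping entity type to bool.
    """
    result = []
    for entity_type, value in entities:
        if isinstance(split_entities_by_comma, dict):
            # Assumption: types not listed fall back to the default (True).
            should_split = split_entities_by_comma.get(entity_type, True)
        else:
            should_split = split_entities_by_comma
        if should_split:
            parts = [part.strip() for part in value.split(",") if part.strip()]
            result.extend((entity_type, part) for part in parts)
        else:
            result.append((entity_type, value))
    return result


extracted = [
    ("ingredients", "apple, banana"),
    ("address", "Schönhauser Allee 175, 10119 Berlin"),
]
# "apple, banana" is split into two entities; the address stays whole.
for entity in split_entities(extracted, {"address": False, "ingredients": True}):
    print(entity)
```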


The above configuration parameters are the ones you should configure to fit your model to your data.
However, additional parameters exist that can be adapted.

<details><summary>More configurable parameters</summary>

```
+---------------------------------------+------------------------+--------------------------------------------------------------+
| Parameter                             | Default Value          | Description                                                  |
+=======================================+========================+==============================================================+
| hidden_layers_sizes                   | text: []               | Hidden layer sizes for layers before the embedding layers    |
|                                       | action_text: []        | for user messages and bot messages in previous actions       |
|                                       | label_action_text: []  | and labels. The number of hidden layers is                   |
|                                       |                        | equal to the length of the corresponding list.               |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| dense_dimension                       | text: 128              | Dense dimension for sparse features to use after they are    |
|                                       | action_text: 128       | converted into dense features.                               |
|                                       | label_action_text: 128 |                                                              |
|                                       | intent: 20             |                                                              |
|                                       | action_name: 20        |                                                              |
|                                       | label_action_name: 20  |                                                              |
|                                       | entities: 20           |                                                              |
|                                       | slots: 20              |                                                              |
|                                       | active_loop: 20        |                                                              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| concat_dimension                      | text: 128              | Common dimension to which sequence and sentence features of  |
|                                       | action_text: 128       | different dimensions get converted before concatenation.     |
|                                       | label_action_text: 128 |                                                              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| encoding_dimension                    | 50                     | Dimension size of embedding vectors                          |
|                                       |                        | before the dialogue transformer encoder.                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| transformer_size                      | text: 128              | Number of units in user text sequence transformer encoder.   |
|                                       | action_text: 128       | Number of units in bot text sequence transformer encoder.    |
|                                       | label_action_text: 128 | Number of units in bot text sequence transformer encoder.    |
|                                       | dialogue: 128          | Number of units in dialogue transformer encoder.             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| number_of_transformer_layers          | text: 1                | Number of layers in user text sequence transformer encoder.  |
|                                       | action_text: 1         | Number of layers in bot text sequence transformer encoder.   |
|                                       | label_action_text: 1   | Number of layers in bot text sequence transformer encoder.   |
|                                       | dialogue: 1            | Number of layers in dialogue transformer encoder.            |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| number_of_attention_heads             | 4                      | Number of self-attention heads in transformers.              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| unidirectional_encoder                | True                   | Use a unidirectional or bidirectional encoder                |
|                                       |                        | for `text`, `action_text`, and `label_action_text`.          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_key_relative_attention            | False                  | If 'True' use key relative embeddings in attention.          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_value_relative_attention          | False                  | If 'True' use value relative embeddings in attention.        |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| max_relative_position                 | None                   | Maximum position for relative embeddings.                    |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| batch_size                            | [64, 256]              | Initial and final value for batch sizes.                     |
|                                       |                        | Batch size will be linearly increased for each epoch.        |
|                                       |                        | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| batch_strategy                        | "balanced"             | Strategy used when creating batches.                         |
|                                       |                        | Can be either 'sequence' or 'balanced'.                      |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| epochs                                | 1                      | Number of epochs to train.                                   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| random_seed                           | None                   | Set random seed to any 'int' to get reproducible results.    |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| learning_rate                         | 0.001                  | Initial learning rate for the optimizer.                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| embedding_dimension                   | 20                     | Dimension size of dialogue & system action embedding vectors.|
+---------------------------------------+------------------------+--------------------------------------------------------------+
| number_of_negative_examples           | 20                     | The number of incorrect labels. The algorithm will minimize  |
|                                       |                        | their similarity to the user input during training.          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| similarity_type                       | "auto"                 | Type of similarity measure to use, either 'auto' or 'cosine' |
|                                       |                        | or 'inner'.                                                  |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| loss_type                             | "cross_entropy"        | The type of the loss function, either 'cross_entropy'        |
|                                       |                        | or 'margin'.                                                 |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| ranking_length                        | 10                     | Number of top actions to normalize scores for. Applicable    |
|                                       |                        | only with loss type 'cross_entropy' and 'softmax'            |
|                                       |                        | confidences. Set to 0 to disable normalization.              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| maximum_positive_similarity           | 0.8                    | Indicates how similar the algorithm should try to make       |
|                                       |                        | embedding vectors for correct labels.                        |
|                                       |                        | Should be 0.0 < ... < 1.0 for 'cosine' similarity type.      |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| maximum_negative_similarity           | -0.2                   | Maximum negative similarity for incorrect labels.            |
|                                       |                        | Should be -1.0 < ... < 1.0 for 'cosine' similarity type.     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_maximum_negative_similarity       | True                   | If 'True' the algorithm only minimizes maximum similarity    |
|                                       |                        | over incorrect intent labels, used only if 'loss_type' is    |
|                                       |                        | set to 'margin'.                                             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| scale_loss                            | True                   | Scale loss inverse proportionally to confidence of correct   |
|                                       |                        | prediction.                                                  |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| regularization_constant               | 0.001                  | The scale of regularization.                                 |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| negative_margin_scale                 | 0.8                    | The scale of how important it is to minimize the maximum     |
|                                       |                        | similarity between embeddings of different labels.           |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| drop_rate_dialogue                    | 0.1                    | Dropout rate for embedding layers of dialogue features.      |
|                                       |                        | Value should be between 0 and 1.                             |
|                                       |                        | The higher the value the higher the regularization effect.   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| drop_rate_label                       | 0.0                    | Dropout rate for embedding layers of label features.         |
|                                       |                        | Value should be between 0 and 1.                             |
|                                       |                        | The higher the value the higher the regularization effect.   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| drop_rate_attention                   | 0.0                    | Dropout rate for attention. Value should be between 0 and 1. |
|                                       |                        | The higher the value the higher the regularization effect.   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| weight_sparsity                       | 0.8                    | Sparsity of the weights in dense layers.                     |
|                                       |                        | Value should be between 0 and 1.                             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_sparse_input_dropout              | True                   | If 'True' apply dropout to sparse input tensors.             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_dense_input_dropout               | True                   | If 'True' apply dropout to sparse features after they are    |
|                                       |                        | converted into dense features.                               |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs       | 20                     | How often to calculate validation accuracy.                  |
|                                       |                        | Set to '-1' to evaluate just once at the end of training.    |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples        | 0                      | How many examples to use for hold out validation set.        |
|                                       |                        | Large values may hurt performance, e.g. model accuracy.      |
|                                       |                        | Keep at 0 if your data set contains a lot of unique examples |
|                                       |                        | of dialogue turns.                                           |
|                                       |                        | Set to 0 for no validation.                                  |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| tensorboard_log_directory             | None                   | If you want to use tensorboard to visualize training         |
|                                       |                        | metrics, set this option to a valid output directory. You    |
|                                       |                        | can view the training metrics after training in tensorboard  |
|                                       |                        | via 'tensorboard --logdir <path-to-given-directory>'.        |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| tensorboard_log_level                 | "epoch"                | Define when training metrics for tensorboard should be       |
|                                       |                        | logged. Either after every epoch ('epoch') or for every      |
|                                       |                        | training step ('batch').                                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| checkpoint_model                      | False                  | Save the best performing model during training. Models are   |
|                                       |                        | stored to the location specified by `--out`. Only the one    |
|                                       |                        | best model will be saved.                                    |
|                                       |                        | Requires `evaluate_on_number_of_examples > 0` and            |
|                                       |                        | `evaluate_every_number_of_epochs > 0`                        |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| e2e_confidence_threshold              | 0.5                    | The threshold that ensures that end-to-end is picked only if |
|                                       |                        | the policy is confident enough.                              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| featurizers                           | []                     | List of featurizer names (alias names). Only features        |
|                                       |                        | coming from the listed names are used. If list is empty      |
|                                       |                        | all available features are used.                             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| entity_recognition                    | True                   | If 'True' entity recognition is trained and entities are     |
|                                       |                        | extracted.                                                   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| constrain_similarities                | False                  | If `True`, applies sigmoid on all similarity terms and adds  |
|                                       |                        | it to the loss function to ensure that similarity values are |
|                                       |                        | approximately bounded. Used only with 'cross_entropy' loss.  |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| model_confidence                      | "softmax"              | Affects how model's confidence for each action               |
|                                       |                        | is computed. It can take two values:                         |
|                                       |                        | 1. `softmax` - Similarities between input and action         |
|                                       |                        | embeddings are post-processed with a softmax function,       |
|                                       |                        | as a result of which confidence for all labels sum up to 1.  |
|                                       |                        | 2. `linear_norm` - Linearly normalized dot product similarity|
|                                       |                        | between input and action embeddings. Confidence for each     |
|                                       |                        | label is in an unbounded range.                              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| BILOU_flag                            | True                   | If 'True', additional BILOU tags are added to entity labels. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| split_entities_by_comma               | True                   | Splits a list of extracted entities by comma to treat each   |
|                                       |                        | one of them as a single entity. Can either be `True`/`False` |
|                                       |                        | globally, or set per entity type, such as:                   |
|                                       |                        | ```                                                          |
|                                       |                        | - name: TEDPolicy                                            |
|                                       |                        |   split_entities_by_comma:                                   |
|                                       |                        |     address: True                                            |
|                                       |                        | ```                                                          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
```

:::note
The parameter `maximum_negative_similarity` is set to a negative value to mimic the original
StarSpace algorithm in the case `maximum_negative_similarity = maximum_positive_similarity` and
`use_maximum_negative_similarity = False`. See the [StarSpace paper](https://arxiv.org/abs/1709.03856)
for details.

:::

</details>


### UnexpecTED Intent Policy

:::caution
This feature is experimental.
We introduce experimental features to get feedback from our community, so we encourage you to try it out!
However, the functionality might be changed or removed in the future.
If you have feedback (positive or negative) please share it with us on the [Rasa Forum](https://forum.rasa.com).

:::

`UnexpecTEDIntentPolicy` helps you review conversations and also allows your bot to react
to unlikely user turns. It is an auxiliary policy that should only be used in
conjunction with at least one other policy, as the only action that it can trigger
is the special [`action_unlikely_intent`](./default-actions.mdx#action_unlikely_intent) action.
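
As a minimal sketch of such a combination, the configuration below lists `UnexpecTEDIntentPolicy` next to `TEDPolicy` and `RulePolicy`. The `max_history` and `epochs` values are illustrative assumptions, not recommended settings:

```yaml-rasa title="config.yml"
policies:
  # Auxiliary policy: can only trigger `action_unlikely_intent`
  - name: UnexpecTEDIntentPolicy
    max_history: 5   # illustrative value
    epochs: 100      # illustrative value
  # Policies that predict the assistant's regular actions
  - name: TEDPolicy
    max_history: 5   # illustrative value
    epochs: 100      # illustrative value
  - name: RulePolicy
```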

`UnexpecTEDIntentPolicy` has the same model architecture as [`TEDPolicy`](./policies.mdx#ted-policy).
The difference is at a task level. Instead of learning the best action to be triggered next,
`UnexpecTEDIntentPolicy` learns the set of intents that are most likely to be expressed by the user
given the conversation context from training stories. At inference time, it uses this information to
check whether the intent predicted by NLU is likely in the current context. If the intent predicted
by NLU is indeed likely to occur given the conversation context, `UnexpecTEDIntentPolicy` does not trigger
any action. Otherwise, it triggers an [`action_unlikely_intent`](./default-actions.mdx#action_unlikely_intent)
with a confidence of `1.00`.

`UnexpecTEDIntentPolicy` should be viewed as an aid for `TEDPolicy`. Since `TEDPolicy` is expected to improve
as the training data covers more of the unique conversation paths that the assistant should handle,
`UnexpecTEDIntentPolicy` helps to surface these unique conversation paths from past conversations. For example, you might have
the following story in your training data:

```yaml-rasa title="stories.yml"
stories:
- story: book_restaurant_table
  steps:
  - intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - action: restaurant_form
  - active_loop: null
  - slot_was_set:
    - requested_slot: null
```

but an actual conversation might contain interjections inside the form that you haven't accounted for:

```yaml-rasa {11-13} title="stories.yml"
stories:
- story: actual_conversation
  steps:
  - user: |
        I'm looking for a restaurant.
    intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - slot_was_set:
    - requested_slot: cuisine
  - user: |
        Does it matter? I want to be quick.
    intent: deny
```

As soon as the user expresses the `deny` intent, the policy handling the form will keep requesting the `cuisine` slot,
as the training stories don't say that this case should be treated differently.
To help you identify that a story handling the user's `deny` intent might be missing at this point,
`UnexpecTEDIntentPolicy` can trigger an `action_unlikely_intent` action right after the `deny` intent.
You can then improve your assistant by adding a new training story that handles this particular case.
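
A story covering this case could look like the sketch below. It assumes the assistant should explain why it asks for the cuisine and then return to the form. The response name `utter_explain_cuisine` is hypothetical and would need to be defined in your domain:

```yaml-rasa title="stories.yml"
stories:
- story: user denies while the restaurant form asks for the cuisine
  steps:
  - intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - slot_was_set:
    - requested_slot: cuisine
  - intent: deny
  - action: utter_explain_cuisine  # hypothetical response, not part of this document
  - action: restaurant_form        # return to the form after handling the interjection
  - active_loop: restaurant_form
```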

To reduce false warnings, `UnexpecTEDIntentPolicy` has two mechanisms in place at inference time:

1. `UnexpecTEDIntentPolicy`'s [priority](./policies.mdx#policy-priority) is intentionally kept lower than that of all
[rule-based policies](./policies.mdx#rule-based-policies) since rules may exist for situations that are novel for
`TEDPolicy` or `UnexpecTEDIntentPolicy`.

2. `UnexpecTEDIntentPolicy` does not predict an `action_unlikely_intent` if the last predicted intent
isn't present in any of the training stories, which might happen if an intent is only used in rules.

#### Prediction of `action_unlikely_intent`

`UnexpecTEDIntentPolicy` is invoked immediately after a user utterance and can either
trigger `action_unlikely_intent` or abstain (in which case other policies will predict actions).
To determine if `action_unlikely_intent` should be triggered, `UnexpecTEDIntentPolicy` computes a score
 for the user's intent in the current dialogue context and checks if this score is below a
 certain threshold score.

 This threshold score is computed by collecting the ML model's output on many "negative examples".
 These negative examples are combinations of dialogue contexts and user
 intents that are _incorrect_. `UnexpecTEDIntentPolicy` generates these negative examples from your
 training data by picking a random story part and pairing it with a random intent that doesn't
 occur at this point. For example, if you had just one training story:

 ```yaml-rasa title="stories.yml"
 version: "2.0"
 stories:
 - story: happy path 1
   steps:
   - intent: greet
   - action: utter_greet
   - intent: mood_great
   - action: utter_goodbye
 ```

 and an intent `affirm`, then a valid negative example will be:

 ```yaml-rasa {7} title="negative_stories.yml"
 version: "2.0"
 stories:
 - story: negative example with affirm unexpected
   steps:
   - intent: greet
   - action: utter_greet
   - intent: affirm
 ```

 Here, the `affirm` intent is unexpected because it doesn't occur in this conversation context in any of the training stories.
 For each intent, `UnexpecTEDIntentPolicy` uses these negative examples to figure out the range of scores the model
 predicts. The threshold score is picked from this range of scores in such a way that the predicted score for a
 certain percentage of negative examples is higher than the threshold score and hence `action_unlikely_intent`
 is not triggered for them. This percentage of negative examples can be controlled by the `tolerance` parameter.
 The higher the `tolerance`, the lower the intent's score (the more unlikely the intent) needs to be
 before `UnexpecTEDIntentPolicy` triggers the `action_unlikely_intent` action.

**Configuration:**

You can pass configuration parameters to the `UnexpecTEDIntentPolicy` using the `config.yml` file.
If you want to fine-tune the model's performance, start by modifying the following parameters:

* `epochs`:
  This parameter sets the number of times the algorithm will see the training data (default: `1`).
  One epoch equals one forward pass and one backward pass over all the training examples.
  Sometimes the model needs more epochs to learn properly.
  Sometimes more epochs don't influence the performance.
  The lower the number of epochs, the faster the model is trained.
  Here is what the config would look like:

  ```yaml-rasa title="config.yml"
  policies:
  - name: UnexpecTEDIntentPolicy
    epochs: 200
  ```

* `max_history`:
  This parameter controls how much dialogue history the model looks at before making an inference.
  Default `max_history` for this policy is `None`, which means that the complete dialogue history
  since session (re)start is taken into account. If you want to limit the model
  to only see a certain number of previous
  dialogue turns, you can set `max_history` to a finite value.
  Please note that you should pick `max_history` carefully, so that the model has enough
  previous dialogue turns to make a correct prediction.
  Depending on your dataset, higher values of `max_history` can result in more frequent prediction of `action_unlikely_intent`
  as the number of unique possible conversation paths increases as more dialogue context is taken
  into account. Conversely, lowering the value of `max_history` can result in `action_unlikely_intent` being
  triggered less often, but each trigger can then be a stronger indicator that the corresponding conversation path
  is highly unique and hence unexpected.
  We recommend setting the `max_history` of `UnexpecTEDIntentPolicy` equal to that of `TEDPolicy`.
  Here is what the config would look like:

  ```yaml-rasa title="config.yml"
  policies:
  - name: UnexpecTEDIntentPolicy
    max_history: 8
  ```

* `ignore_intents_list`:
    This parameter lets you configure `UnexpecTEDIntentPolicy` to not predict `action_unlikely_intent` for
    a subset of intents. You might want to do this if you come across a list of intents for which
    too many false warnings are generated.

* `tolerance`:
    The `tolerance` parameter is a number that ranges from `0.0` to `1.0` (inclusive).
    It helps to adjust the threshold score used during
    [prediction of `action_unlikely_intent`](./policies.mdx#prediction-of-action_unlikely_intent)
    at inference time.

    Here, `0.0` means that the threshold score will be adjusted in such a way that `0%` of negative
    examples encountered during training are predicted with a score higher than the threshold score.
    Hence, conversation contexts from all negative examples will trigger an `action_unlikely_intent` action.

    A tolerance of `0.1` means that the threshold score will be adjusted in such a way that `10%` of negative
    examples encountered during training are predicted with a score higher than the threshold score,
    so `action_unlikely_intent` is not triggered for them.

    A tolerance of `1.0` means that the threshold score is so low that `UnexpecTEDIntentPolicy` would not
    trigger `action_unlikely_intent` for any of the negative examples that it has encountered
    during training.
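
The `ignore_intents_list` and `tolerance` parameters described above are set in the same way as `epochs` and `max_history`. The sketch below is only an illustration: the intent names are placeholders borrowed from the example stories, and `0.2` is an arbitrary starting value rather than a recommendation:

```yaml-rasa title="config.yml"
policies:
- name: UnexpecTEDIntentPolicy
  # Never predict `action_unlikely_intent` for these intents (placeholder names)
  ignore_intents_list:
  - greet
  - affirm
  # Arbitrary example value between 0.0 and 1.0
  tolerance: 0.2
```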

The above configuration parameters are the ones you should try tweaking according to your use case and training data.
However, additional parameters exist that you could adapt.

<details><summary>More configurable parameters</summary>

```
+---------------------------------------+------------------------+--------------------------------------------------------------+
| Parameter                             | Default Value          | Description                                                  |
+=======================================+========================+==============================================================+
| hidden_layers_sizes                   | text: []               | Hidden layer sizes for layers before the embedding layers    |
|                                       |                        | for user messages and bot messages in previous actions       |
|                                       |                        | and labels. The number of hidden layers is                   |
|                                       |                        | equal to the length of the corresponding list.               |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| dense_dimension                       | text: 128              | Dense dimension for sparse features to use after they are    |
|                                       | intent: 20             | converted into dense features.                               |
|                                       | action_name: 20        |                                                              |
|                                       | label_intent: 20       |                                                              |
|                                       | entities: 20           |                                                              |
|                                       | slots: 20              |                                                              |
|                                       | active_loop: 20        |                                                              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| concat_dimension                      | text: 128              | Common dimension to which sequence and sentence features of  |
|                                       |                        | different dimensions get converted before concatenation.     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| encoding_dimension                    | 50                     | Dimension size of embedding vectors                          |
|                                       |                        | before the dialogue transformer encoder.                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| transformer_size                      | text: 128              | Number of units in user text sequence transformer encoder.   |
|                                       | dialogue: 128          | Number of units in dialogue transformer encoder.             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| number_of_transformer_layers          | text: 1                | Number of layers in user text sequence transformer encoder.  |
|                                       | dialogue: 1            | Number of layers in dialogue transformer encoder.            |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| number_of_attention_heads             | 4                      | Number of self-attention heads in transformers.              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| unidirectional_encoder                | True                   | Use a unidirectional or bidirectional encoder                |
|                                       |                        | for `text`, `action_text`, and `label_action_text`.          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_key_relative_attention            | False                  | If 'True' use key relative embeddings in attention.          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_value_relative_attention          | False                  | If 'True' use value relative embeddings in attention.        |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| max_relative_position                 | None                   | Maximum position for relative embeddings.                    |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| batch_size                            | [64, 256]              | Initial and final value for batch sizes.                     |
|                                       |                        | Batch size will be linearly increased for each epoch.        |
|                                       |                        | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| batch_strategy                        | "balanced"             | Strategy used when creating batches.                         |
|                                       |                        | Can be either 'sequence' or 'balanced'.                      |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| epochs                                | 1                      | Number of epochs to train.                                   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| random_seed                           | None                   | Set random seed to any 'int' to get reproducible results.    |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| learning_rate                         | 0.001                  | Initial learning rate for the optimizer.                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| embedding_dimension                   | 20                     | Dimension size of dialogue & system action embedding vectors.|
+---------------------------------------+------------------------+--------------------------------------------------------------+
| number_of_negative_examples           | 20                     | The number of incorrect labels. The algorithm will minimize  |
|                                       |                        | their similarity to the user input during training.          |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| ranking_length                        | 10                     | Number of top actions to normalize scores for. Applicable    |
|                                       |                        | only with loss type 'cross_entropy' and 'softmax'            |
|                                       |                        | confidences. Set to 0 to disable normalization.              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| scale_loss                            | True                   | Scale loss inverse proportionally to confidence of correct   |
|                                       |                        | prediction.                                                  |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| regularization_constant               | 0.001                  | The scale of regularization.                                 |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| drop_rate_dialogue                    | 0.1                    | Dropout rate for embedding layers of dialogue features.      |
|                                       |                        | Value should be between 0 and 1.                             |
|                                       |                        | The higher the value the higher the regularization effect.   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| drop_rate_label                       | 0.0                    | Dropout rate for embedding layers of label features.         |
|                                       |                        | Value should be between 0 and 1.                             |
|                                       |                        | The higher the value the higher the regularization effect.   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| drop_rate_attention                   | 0.0                    | Dropout rate for attention. Value should be between 0 and 1. |
|                                       |                        | The higher the value the higher the regularization effect.   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| connection_density                    | 0.8                    | Sparsity of the weights in dense layers.                     |
|                                       |                        | Value should be between 0 and 1.                             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_sparse_input_dropout              | True                   | If 'True' apply dropout to sparse input tensors.             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| use_dense_input_dropout               | True                   | If 'True' apply dropout to sparse features after they are    |
|                                       |                        | converted into dense features.                               |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs       | 20                     | How often to calculate validation accuracy.                  |
|                                       |                        | Set to '-1' to evaluate just once at the end of training.    |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples        | 0                      | How many examples to use for hold out validation set.        |
|                                       |                        | Large values may hurt performance, e.g. model accuracy.      |
|                                       |                        | Keep at 0 if your data set contains a lot of unique examples |
|                                       |                        | of dialogue turns.                                           |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| tensorboard_log_directory             | None                   | If you want to use tensorboard to visualize training         |
|                                       |                        | metrics, set this option to a valid output directory. You    |
|                                       |                        | can view the training metrics after training in tensorboard  |
|                                       |                        | via 'tensorboard --logdir <path-to-given-directory>'.        |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| tensorboard_log_level                 | "epoch"                | Define when training metrics for tensorboard should be       |
|                                       |                        | logged. Either after every epoch ('epoch') or for every      |
|                                       |                        | training step ('batch').                                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| checkpoint_model                      | False                  | Save the best performing model during training. Models are   |
|                                       |                        | stored to the location specified by `--out`. Only the one    |
|                                       |                        | best model will be saved.                                    |
|                                       |                        | Requires `evaluate_on_number_of_examples > 0` and            |
|                                       |                        | `evaluate_every_number_of_epochs > 0`                        |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| featurizers                           | []                     | List of featurizer names (alias names). Only features        |
|                                       |                        | coming from the listed names are used. If list is empty      |
|                                       |                        | all available features are used.                             |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| ignore_intents_list                   | []                     | This parameter lets you configure `UnexpecTEDIntentPolicy`   |
|                                       |                        | to ignore the prediction of `action_unlikely_intent` for a   |
|                                       |                        | subset of intents. You might want to do this if you come     |
|                                       |                        | across a certain list of intents for which there are too     |
|                                       |                        | many false warnings generated.                               |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| tolerance                             | 0.0                    | The `tolerance` parameter is a number that ranges from `0.0` |
|                                       |                        | to `1.0` (inclusive). It helps to adjust the threshold score |
|                                       |                        | used during prediction of `action_unlikely_intent` at        |
|                                       |                        | inference time. Here, `0.0` means that the score threshold   |
|                                       |                        | is the one that `UnexpecTEDIntentPolicy` had determined at   |
|                                       |                        | training time. A tolerance of `1.0` means that the threshold |
|                                       |                        | score is so low that `UnexpecTEDIntentPolicy` would not      |
|                                       |                        | trigger `action_unlikely_intent` for any of the "negative    |
|                                       |                        | examples" that it has encountered during training. These     |
|                                       |                        | negative examples are combinations of dialogue contexts and  |
|                                       |                        | user intents that are _incorrect_. `UnexpecTEDIntentPolicy`  |
|                                       |                        | generates these negative examples from your training data by |
|                                       |                        | picking a random story part and pairing it with a random     |
|                                       |                        | intent that doesn't occur at this point.                     |
+---------------------------------------+------------------------+--------------------------------------------------------------+
```

</details>

#### Tuning the tolerance parameter

When [reviewing real conversations](./conversation-driven-development.mdx#review), we encourage you
to tune the `tolerance` parameter in `UnexpecTEDIntentPolicy`'s configuration to reduce the number
of false warnings (intents that actually are likely given the conversation context).
As you increase the value of `tolerance` from `0` to `1` in steps of `0.05`,
the number of false warnings should decrease. However, increasing the `tolerance` will
also result in fewer triggers of `action_unlikely_intent`, and hence more conversation
paths not present in training stories will be missing from the set of flagged conversations.
If you change the `max_history` value and retrain a model, you might have to re-adjust the `tolerance` value as well.

:::note
`UnexpecTEDIntentPolicy` is only trained on [stories](./stories.mdx) and not [rules](./rules.mdx) from the training data.

:::

### Memoization Policy

The `MemoizationPolicy` remembers the stories from your
training data. It checks if the current conversation matches the stories in your
`stories.yml` file. If so, it will predict the next action from the matching
stories of your training data with a confidence of `1.0`. If no matching conversation
is found, the policy predicts `None` with confidence `0.0`.

When looking for a match in your training data, the policy will take the last
`max_history` number of turns of the conversation into account.
One “turn” includes the message sent by the user and any actions the
assistant performed before waiting for the next message.

You can configure the number of turns the `MemoizationPolicy` should use in your
configuration:
```yaml title="config.yml"
policies:
  - name: "MemoizationPolicy"
    max_history: 3
```


### Augmented Memoization Policy

The `AugmentedMemoizationPolicy` remembers examples from training
stories for up to `max_history` turns, just like the `MemoizationPolicy`.
Additionally, it has a forgetting mechanism that will forget a certain amount
of steps in the conversation history and try to find a match in your stories
with the reduced history. It predicts the next action with confidence `1.0`
if a match is found, otherwise it predicts `None` with confidence `0.0`.
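
Since it relies on `max_history` just like the `MemoizationPolicy`, a minimal configuration sketch could look as follows. The value `3` is only an illustration:

```yaml-rasa title="config.yml"
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 3  # illustrative value
```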

:::note Slots and predictions
If you have dialogues where some slots that are set during
prediction time might not be set in training stories (e.g. in training
stories starting with a [reminder](./reaching-out-to-user.mdx#reminders), not all previous slots are set),
make sure to add the relevant stories without slots to your training
data as well.

:::


## Rule-based Policies

### Rule Policy

The `RulePolicy` is a policy that handles conversation parts that follow
a fixed behavior (e.g. business logic). It makes predictions based on
any `rules` you have in your training data. See the
[Rules documentation](./rules.mdx) for further information on how to define rules.

The `RulePolicy` has the following configuration options:

```yaml title="config.yml"
policies:
  - name: "RulePolicy"
    core_fallback_threshold: 0.3
    core_fallback_action_name: action_default_fallback
    enable_fallback_prediction: true
    restrict_rules: true
    check_for_contradictions: true
```

* `core_fallback_threshold` (default: `0.3`): Please see the
   [fallback documentation](fallback-handoff.mdx#handling-low-action-confidence) for
   further information.
* `core_fallback_action_name` (default: `action_default_fallback`): Please see the
   [fallback documentation](fallback-handoff.mdx#handling-low-action-confidence) for
   further information.
* `enable_fallback_prediction` (default: `true`): Please see the
   [fallback documentation](fallback-handoff.mdx#handling-low-action-confidence) for
   further information.
* `check_for_contradictions` (default: `true`):
   Before training, the RulePolicy will perform a check to make sure that
   slots and active loops set by actions are defined consistently for all rules.
   The following snippet contains an example of an incomplete rule:

   ```yaml-rasa
   rules:
   - rule: complete rule
     steps:
     - intent: search_venues
     - action: action_search_venues
     - slot_was_set:
       - venues: [{"name": "Big Arena", "reviews": 4.5}]

   - rule: incomplete rule
     steps:
     - intent: search_venues
     - action: action_search_venues
   ```

   In the second `incomplete rule`, `action_search_venues` should set
   the `venues` slot because it is set in `complete rule`, but this event is missing.
   There are several possible ways to fix this rule.

   In the case when `action_search_venues` can't find
   a venue and the `venues` slot should not be set,
   you should explicitly set the value of the slot to `null`.
   In the following rule, `RulePolicy` will predict `utter_venues_not_found`
   only if the slot `venues` is not set:

   ```yaml-rasa
   rules:
   - rule: fixes incomplete rule
     steps:
     - intent: search_venues
     - action: action_search_venues
     - slot_was_set:
       - venues: null
     - action: utter_venues_not_found
   ```

   If you want the slot setting to be handled by a different rule or story,
   you should add `wait_for_user_input: false` to the end of the rule snippet:

   ```yaml-rasa
   rules:
   - rule: incomplete rule
     steps:
     - intent: search_venues
     - action: action_search_venues
     wait_for_user_input: false
   ```

   After training, the RulePolicy will check that none of the rules or stories contradict
   each other. The following snippet is an example of two contradicting rules:

    ```yaml-rasa
    rules:
    - rule: Chitchat
      steps:
      - intent: chitchat
      - action: utter_chitchat

    - rule: Greet instead of chitchat
      steps:
      - intent: chitchat
      - action: utter_greet  # `utter_greet` contradicts `utter_chitchat` from the rule above
    ```
 * `restrict_rules` (default: `true`): Rules are restricted to one user turn, but
    there can be multiple bot events, for example a form being filled and its subsequent submission.
    Changing this parameter to `false` may result in unexpected behavior.

  :::caution Overusing rules
    Overusing rules for purposes outside of the [recommended use cases](rules.mdx)
    will make it very hard to maintain your assistant as the complexity grows.

  :::

## Configuring Policies

### Max History

One important hyperparameter for Rasa Open Source policies is the `max_history`.
This controls how much dialogue history the model looks at to decide which
action to take next.

You can set the `max_history` by passing it to your policy
in the policy configuration in your `config.yml`.
The default value is `None`, which means that the complete dialogue history since session
restart is taken into account.

```yaml-rasa title="config.yml" {3}
policies:
  - name: TEDPolicy
    max_history: 5
    epochs: 200
    batch_size: 50
    max_training_samples: 300
```

:::note
`RulePolicy` doesn't have a `max_history` parameter; it always considers the full length
of the provided rules. Please see [Rules](./rules.mdx) for further information.
:::

As an example, let's say you have an `out_of_scope` intent which
describes off-topic user messages. If your bot sees this intent multiple
times in a row, you might want to tell the user what you can help them
with. So your story might look like this:

```yaml-rasa
stories:
  - story: utter help after 2 fallbacks
    steps:
    - intent: out_of_scope
    - action: utter_default
    - intent: out_of_scope
    - action: utter_default
    - intent: out_of_scope
    - action: utter_help_message
```

For your model to learn this pattern, the `max_history`
has to be at least 4.
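
To see where the 4 comes from, the sketch below windows the conversation states that precede each prediction. It is a simplified illustration, not Rasa's featurizer: the `State` tuple, the `window` helper, and the `None` padding are assumptions made for this example. It also assumes that the intent is only part of the first state in a turn.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified state: (intent seen in this state, previous action).
State = Tuple[Optional[str], str]


def window(states: List[State], max_history: int) -> List[Optional[State]]:
    """Keep the last `max_history` states, left-padding with None."""
    tail = states[-max_history:]
    return [None] * (max_history - len(tail)) + tail


user_turn: State = ("out_of_scope", "action_listen")
after_default: State = (None, "utter_default")

# States visible when the second `utter_default` should be predicted.
before_second_default = [user_turn, after_default, user_turn]
# States visible when `utter_help_message` should be predicted.
before_help = [user_turn, after_default, user_turn, after_default, user_turn]

# With 3 states of history both situations look identical ...
same_with_3 = window(before_second_default, 3) == window(before_help, 3)
# ... with 4 states the extra `utter_default` tells them apart.
same_with_4 = window(before_second_default, 4) == window(before_help, 4)
print(same_with_3, same_with_4)  # True False
```

With a window of 3 the two situations are indistinguishable, so the policy has no way to pick `utter_help_message` over `utter_default`.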

If you increase your `max_history`, your model will become bigger and
training will take longer. If you have some information that should
affect the dialogue very far into the future, you should store it as a
slot. Slot information is always available for every featurizer.

### Data Augmentation

When you train a model, Rasa Open Source will create
longer stories by randomly combining
the ones in your stories files.
Take the stories below as an example:

```yaml-rasa
stories:
  - story: thank
    steps:
    - intent: thankyou
    - action: utter_youarewelcome
  - story: say goodbye
    steps:
    - intent: goodbye
    - action: utter_goodbye
```

You actually want to teach your policy to **ignore** the dialogue history
when it isn't relevant and to respond with the same action no matter
what happened before.

You can alter this behavior with the `--augmentation` flag,
which allows you to set the `augmentation_factor`.
The `augmentation_factor` determines how many augmented stories are
subsampled during training. The augmented stories are subsampled before training
since their number can quickly become very large, and you want to limit it.
The number of sampled stories is `augmentation_factor` x10.
By default augmentation is set to 20, resulting in a maximum of 200 augmented stories.

`--augmentation 0` disables all augmentation behavior.
The memoization-based policies are not affected by augmentation
(independent of the `augmentation_factor`) and will automatically
ignore all augmented stories.
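
The subsampling arithmetic can be sketched as follows. This is a toy illustration and not Rasa's implementation: the `augment` function, the strategy of gluing ordered pairs of stories together, and the `seed` parameter are assumptions. Only the cap of `augmentation_factor` x10 is taken from the description above.

```python
import itertools
import random
from typing import List

Story = List[str]


def augment(stories: List[Story], augmentation_factor: int, seed: int = 0) -> List[Story]:
    """Glue ordered pairs of stories together, then subsample to the cap."""
    combined = [a + b for a, b in itertools.permutations(stories, 2)]
    limit = augmentation_factor * 10  # cap on the number of augmented stories
    if len(combined) <= limit:
        return combined
    # Seeded generator so that the subsample is reproducible.
    return random.Random(seed).sample(combined, limit)


stories = [
    ["intent: thankyou", "action: utter_youarewelcome"],
    ["intent: goodbye", "action: utter_goodbye"],
]

print(len(augment(stories, augmentation_factor=20)))  # 2 (below the cap of 200)
print(len(augment(stories, augmentation_factor=0)))   # 0 (augmentation disabled)
```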

### Featurizers

In order to apply machine learning algorithms to conversational AI, you need
to build up vector representations of conversations.

Each story corresponds to a tracker which consists of the states of the
conversation just before each action was taken.

#### State Featurizers

Every event in a tracker's history creates a new state (e.g. running a bot
action, receiving a user message, setting slots). Featurizing a single state
of the tracker has two steps:

1. **Tracker provides a bag of active features**:

    * features indicating intents and entities, if this is the first
     state in a turn, i.e. it's the first action we will take after
     parsing the user's message (e.g.
     `[intent_restaurant_search, entity_cuisine]`)

    * features indicating which slots are currently defined, e.g.
     `slot_location` if the user previously mentioned the area
     they're searching for restaurants.

    * features indicating the results of any API calls stored in
     slots, e.g. `slot_matches`

    * features indicating what the last bot action or bot utterance was (e.g.
     `prev_action_listen`)

    * features indicating if any loop is active and which one

2. **Convert all the features into numeric vectors**:

    `SingleStateFeaturizer` uses the Rasa NLU pipeline to convert the intent and
    bot action names or bot utterances into numeric vectors.
    See the [NLU Model Configuration](./model-configuration.mdx) documentation
    for the details on how to configure the Rasa NLU pipeline.

    Entities, slots and active loops are featurized as one-hot encodings
    to indicate their presence.
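
The two steps above can be sketched with a plain presence encoding. This is a deliberate simplification: the `featurize_state` helper and the fixed `vocabulary` list are assumptions for illustration, and the sketch encodes every feature by presence, whereas `SingleStateFeaturizer` runs intents and action names through the NLU pipeline.

```python
from typing import List


def featurize_state(active: List[str], vocabulary: List[str]) -> List[int]:
    """Return a vector with a 1 for every active feature and 0 elsewhere."""
    index = {name: position for position, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for name in active:
        vector[index[name]] = 1
    return vector


# Hypothetical vocabulary built from the feature names used in the list above.
vocabulary = [
    "intent_restaurant_search",
    "entity_cuisine",
    "slot_location",
    "slot_matches",
    "prev_action_listen",
]

# Step 1: the bag of active features for the first state in a turn.
active = ["intent_restaurant_search", "entity_cuisine", "prev_action_listen"]

# Step 2: convert the bag into a numeric vector.
print(featurize_state(active, vocabulary))  # [1, 1, 0, 0, 1]
```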

:::note
If the domain defines the possible `actions`,
`[ActionGreet, ActionGoodbye]`,
4 additional default actions are added:
`[ActionListen(), ActionRestart(),
ActionDefaultFallback(), ActionDeactivateForm()]`.
Therefore, label `0` indicates default action listen, label `1`
default restart, label `2` a greeting and `3` indicates goodbye.

:::

#### Tracker Featurizers

A policy can be trained to learn two kinds of labels:

1. The next most appropriate action to be triggered by the assistant. For example, [`TEDPolicy`](#ted-policy) is trained to do this.
2. The next most likely intent that a user can express. For example, [`UnexpecTEDIntentPolicy`](#unexpected-intent-policy) is trained to learn this.

Hence, a tracker can be featurized to learn one of the labels mentioned above.
Depending on the policy, the target labels are either bot actions or bot utterances,
each represented as an index into the list of all possible actions, or
intents, each represented as an index into the list of all possible intents.

Tracker Featurizers come in three different flavours:

##### 1. Full Dialogue

`FullDialogueTrackerFeaturizer` creates a numerical representation of
stories to feed to a recurrent neural network where the whole dialogue
is fed to a network and the gradient is backpropagated from all time steps.
The target label is the most appropriate bot action or bot utterance which should be triggered in the
context of the conversation.
The `TrackerFeaturizer` iterates over tracker
states and calls a `SingleStateFeaturizer` for each state to create numeric input features for a policy.


##### 2. Max History

`MaxHistoryTrackerFeaturizer` operates very similarly to `FullDialogueTrackerFeaturizer` as
it creates an array of previous tracker states for each bot action or bot utterance but with the parameter
`max_history` defining how many states go into each row of input features.
If `max_history` is not specified, the algorithm takes
the whole length of a dialogue into account.
Deduplication is performed to filter out duplicated turns (bot actions
or bot utterances) in terms of their previous states.

For some algorithms a flat feature vector is needed, so input features
should be reshaped to `(num_unique_turns, max_history * num_input_features)`.
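
A minimal sketch of the windowing, deduplication, and flattening described above, in plain Python. The `max_history_rows` function is hypothetical, and padding short histories with zero vectors is an assumption of this example rather than a statement about how Rasa pads.

```python
from typing import List, Tuple

Vector = List[int]


def max_history_rows(
    states: List[Vector], actions: List[int], max_history: int
) -> Tuple[List[Vector], List[int]]:
    """Build one flat row of `max_history` states per action, dropping duplicates."""
    padding = [0] * len(states[0])
    seen = set()
    rows: List[Vector] = []
    labels: List[int] = []
    for i, label in enumerate(actions):
        # States up to and including the one in which action i is predicted.
        history = states[max(0, i + 1 - max_history): i + 1]
        history = [padding] * (max_history - len(history)) + history
        # Flatten to a vector of length max_history * num_input_features.
        flat = [value for state in history for value in state]
        key = (tuple(flat), label)
        if key in seen:
            continue  # same previous states and same target: duplicate turn
        seen.add(key)
        rows.append(flat)
        labels.append(label)
    return rows, labels


states = [[1, 0], [0, 1], [1, 0], [0, 1]]  # four already featurized states
actions = [0, 1, 0, 1]                      # index of the action taken in each
rows, labels = max_history_rows(states, actions, max_history=2)
print(rows)    # [[0, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0]]
print(labels)  # [0, 1, 0]
```

The fourth turn repeats the second one exactly, so three unique rows of length `2 * 2` remain.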

##### 3. Intent Max History

`IntentMaxHistoryTrackerFeaturizer` inherits from `MaxHistoryTrackerFeaturizer`. Since it is used by
[`UnexpecTEDIntentPolicy`](#unexpected-intent-policy), the target labels that it creates are the intents that can be
expressed by a user in the context of a conversation tracker. Unlike
other tracker featurizers, there can be multiple target labels. Hence, it pads the
list of target labels with a constant value (`-1`) on the right to return an equally sized list of target labels
for each input conversation tracker.

Just like `MaxHistoryTrackerFeaturizer`, it also performs deduplication to
filter out duplicated turns. However, it yields one featurized tracker per correct intent
for the corresponding tracker. For example, if the correct labels for an input conversation tracker have the
indices `[0, 2, 4]`, then the featurizer will yield three pairs of featurized trackers and target labels.
The featurized trackers will be identical to each other but the target labels in each pair will be
`[0, 2, 4]`, `[4, 0, 2]`, `[2, 4, 0]`.
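
The label lists in this example can be reproduced with a rotation followed by right-padding. The sketch below is only one ordering that matches the example; whether the featurizer rotates the labels internally is an assumption, and the `rotations` and `pad_right` helpers as well as the target width of 4 are hypothetical.

```python
from typing import List

PADDING = -1  # padding value named in the text above


def pad_right(labels: List[int], width: int) -> List[int]:
    """Pad a label list on the right so that all lists have the same length."""
    return labels + [PADDING] * (width - len(labels))


def rotations(labels: List[int]) -> List[List[int]]:
    """Return every right-rotation of the labels, starting with the original order."""
    n = len(labels)
    return [labels[n - k:] + labels[:n - k] for k in range(n)]


# Assume some other tracker has four correct intents, so every list is padded to 4.
targets = [pad_right(rotation, 4) for rotation in rotations([0, 2, 4])]
print(targets)  # [[0, 2, 4, -1], [4, 0, 2, -1], [2, 4, 0, -1]]
```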


## Custom Policies

You can also write custom policies and reference them in your configuration. In the example below, the
last two lines show how to use a custom policy class and pass arguments to it.

```yaml-rasa {6-7}
policies:
  - name: "TEDPolicy"
    max_history: 5
    epochs: 200
  - name: "RulePolicy"
  - name: "path.to.your.policy.class"
    arg1: "..."
```
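
The following sketch shows, in generic Python, how a dotted path such as `path.to.your.policy.class` can be resolved to a class and how the remaining keys of the entry can be passed to it as keyword arguments. How Rasa performs this lookup internally is not covered here, so the `instantiate` helper is an assumption, and `fractions.Fraction` is only a stand-in for a real policy class.

```python
import importlib
from typing import Any, Dict


def instantiate(entry: Dict[str, Any]) -> Any:
    """Import the class named by `name` and pass the other keys as keyword arguments."""
    kwargs = dict(entry)  # copy so that the caller's configuration is not modified
    module_name, _, class_name = kwargs.pop("name").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


# Stand-in for a policy entry: `fractions.Fraction` plays the role of
# `path.to.your.policy.class`, and its two arguments play the role of `arg1`.
half = instantiate({"name": "fractions.Fraction", "numerator": 1, "denominator": 2})
print(half)  # 1/2
```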
